Learning Basics of Tensors

Pytorch
Deep Learning
Linear Algebra
GPU Programming
Author

Ismail TG

Published

July 27, 2025

import torch
import numpy as np
a = torch.tensor(range(6))
a = a.reshape(2, 3)
a.shape, a.stride()
(torch.Size([2, 3]), (3, 1))

Tensors:

  • We could define a Tensor as a container for numerical data, arranged in a regular grid, with a defined shape and layout.
  • Basically we have 4 levels of tensors:
    • Scalar: just a number, e.g. 8 ==> 0D
    • Vector: a 1D row of numbers [1, 2]
    • Matrix: a 2D grid of rows & columns [[1, 2], [4, 1]]
    • Tensor: any nD generalization of this: 3D, 4D, ...
# scalar has 0 dimension:
scalar = torch.tensor(8)
scalar.ndim
0
# vector is 1D row of numbers:
vec = torch.tensor([1, 2])
vec.ndim
1
# matrix is a 2D grid:
mat = torch.tensor([[1, 2], [3, 4]])
mat.ndim
2
# a tensor is any shape of data that could be represented in 3 or more dimensions:
ten = torch.tensor(range(12))
ten = ten.reshape(2, 3, 2)
ten.ndim
3

Data BLOBs:

  • The last tensor we’ve created has a very interesting property: first we created the number of elements we want, 12, then we reshaped it while respecting 2 rules:
    • the total number of elements should be exactly 12
    • these 12 elements should be distributed over 3 dimensions in order to call it a tensor
    • we decided to go with (2, 3, 2), but we could go with any distribution as long as we respect the 2 rules.
  • The 12 elements represent the data BLOB, while the distribution represents the metadata that tells us how the data is shaped.
  • A data BLOB is a large, raw chunk of numerical data with no assumed structure until interpreted; it’s shapeless until we attach metadata to it.
  • In the context of kernel engineering we are not working with well-defined tensor shapes, but with:
    • pointers to data blobs in memory
    • some metadata (shape, strides, dtype)
    • a set of indexing rules to access the correct slice
  • So if we create a tensor:
x = torch.randn(3, 4, 5)
  • Under the hood the data is stored in a single flat buffer of 60 floats.
  • The shape tells us: this is 3 blocks of 4 rows of 5 elements.
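  • To make the blob/metadata split concrete, here is a minimal sketch (reusing a 12-element buffer like the one above; as_strided is used purely for illustration): the same flat blob can be reinterpreted with different shapes and strides without any copy.
blob = torch.arange(12)
# reinterpret the same 12-element buffer with two different shape/stride pairs
view_a = blob.as_strided(size=(2, 3, 2), stride=(6, 2, 1))
view_b = blob.as_strided(size=(3, 4), stride=(4, 1))
print(view_a.shape, view_b.shape)   # torch.Size([2, 3, 2]) torch.Size([3, 4])
# all three share the same underlying memory
print(blob.data_ptr() == view_a.data_ptr() == view_b.data_ptr())   # True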

Stride:

  • In the context of kernel engineering, the stride is the most important piece of metadata. Since data is stored in memory as blobs, the stride tells us how many elements to skip in memory to move to the next element along a specific dimension. Think of it as the “memory jump” for each axis.
z = torch.tensor(range(6))
z = z.reshape(3, 2)
z.shape, z.stride()
(torch.Size([3, 2]), (2, 1))
  • The stride says:
    • to move down one row: jump stride[0] elements
    • to move right one column: jump stride[1] element(s)
  • So in our case the tensor z has a stride of (2, 1):
    • 2 is the number of elements to jump in order to get to the next row
    • while 1 is the number of elements to jump to get to the next column
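  • As a quick sanity check (a small sketch reusing z from above): with stride (2, 1), the element z[i, j] lives at flat offset i * 2 + j * 1 in the underlying buffer.
flat = z.flatten()                           # the flat buffer, as a view
i, j = 2, 1
offset = i * z.stride(0) + j * z.stride(1)   # 2*2 + 1*1 = 5
print(flat[offset], z[i, j])                 # tensor(5) tensor(5)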

Transposed Stride:

  • What if we transpose the tensor z? Will the stride remain the same?
y = z.t()
z.stride(), y.stride()
((2, 1), (1, 2))
  • The transpose changed the stride but the data blob remains the same:
z.data_ptr() == y.data_ptr()
True
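  • One consequence worth noting (a small sketch reusing z and y): the transposed view no longer walks memory row-major, so PyTorch reports it as non-contiguous, and calling contiguous() forces a real copy.
print(z.is_contiguous(), y.is_contiguous())   # True False
yc = y.contiguous()                           # materializes a row-major copy
print(yc.stride())                            # (3, 1)
print(yc.data_ptr() == z.data_ptr())          # False: new memory was allocated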

Stride exercises:

  • Learn how stride works with some simple Pytorch examples:

Exercise 1: Basic 2D Tensor

  • Create a 2D tensor and inspect its stride.
# x is a 2D tensor
x =  torch.tensor(range(6)).reshape(2, 3)
# its shape:
x.shape
torch.Size([2, 3])
  • How should we think about its stride?
    • In order to move to the next row, how many elements should we pass? ==> 3
    • In order to get to the next column, how many elements do we need to jump? ==> 1
  • So the stride is (3, 1)
x, x.stride()
(tensor([[0, 1, 2],
         [3, 4, 5]]),
 (3, 1))

Exercise 2: Transposed Tensor

  • Transpose the tensor and observe how the stride changes.
y = x.t()
y, y.shape
(tensor([[0, 3],
         [1, 4],
         [2, 5]]),
 torch.Size([3, 2]))
  • In this case, since we reversed the shape, it’s obvious that the stride will also be reversed: (1, 3)
  • What’s important is that Pytorch doesn’t create a new copy of x when transposed; it only redefines how the new tensor is viewed, by creating a new shape and a new stride.
y.stride()
(1, 3)
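  • A quick check (reusing x and y from this exercise): both tensors point at the same storage, confirming that no data was copied.
print(x.data_ptr() == y.data_ptr())   # True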

Exercise 3: Unsqueezed Tensor

  • Add a new dimension and understand how stride adjusts.
z = x.unsqueeze(0)
x.shape, z.shape
(torch.Size([2, 3]), torch.Size([1, 2, 3]))
  • What happened here is that Pytorch pretends there’s a new outer dim, so the shape changed from [2, 3] to [1, 2, 3].
  • The new dim dim[0] gets a stride of 6, because in order to move one step along that dim (even though there’s only one element in it) we would need to pass all the elements covered by dimensions [1] and [2], which together contain 2*3 = 6 elements.
  • RULE: for a contiguous tensor, the stride of the newly inserted dim is the product of the sizes of the dimensions to its right (the number of elements one step along it would skip); since that dim has size 1, the value never actually affects indexing.
    • so the stride should be: (6, 3, 1)
z.stride()
(6, 3, 1)
  • The same logic applies if we insert the new dimension in the middle of the shape:
d = torch.tensor(range(8)).reshape(2, 4)
d, d.shape
(tensor([[0, 1, 2, 3],
         [4, 5, 6, 7]]),
 torch.Size([2, 4]))
d.stride()
(4, 1)
d1 = d.unsqueeze(1)
d1
tensor([[[0, 1, 2, 3]],

        [[4, 5, 6, 7]]])
d1.shape, d1.stride()
(torch.Size([2, 1, 4]), (4, 4, 1))
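  • The inserted dim sits just before the size-4 dim, so its stride is 4: one step along it would skip the 4 innermost elements. A quick, hypothetical check of the rule above (math.prod is used only for the illustration):
import math
# product of the sizes to the right of the new dim (contiguous case)
expected = math.prod(d1.shape[2:])
print(expected, d1.stride(1))   # 4 4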

Exercise 4: Expanded Tensor

  • Broadcast a tensor without copying memory.
a = torch.ones(1, 3)
b = a.expand(2, 3)
a.shape, b.shape
(torch.Size([1, 3]), torch.Size([2, 3]))
a.stride()
(3, 1)
  • Here we have a tensor a of shape [1, 3], then we use expand to make a tensor b with a shape of [2, 3].
  • The method expand doesn’t create a new copy of the original tensor; rather, it virtually expands a dimension by repeating it without changing the memory.
  • In this case dim[0] will be virtually repeated 2 times.
  • In the original tensor a we have a stride of (3, 1):
    • In order to get to the next element along dim=0 (rows) we have to move 3 steps in memory
    • To move to the next element along dim=1 (columns), step by 1 in memory.
  • Now with tensor b: as we said, expand adds virtual rows along dim=0, but in memory nothing changes. So to move to the next row we don’t have to step at all, and the stride for that dimension becomes 0.
  • The other dim=1 remains the same: 1
b.stride()
(0, 1)
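  • A small sketch to make the stride-0 trick visible (reusing a and b from above): every “row” of b reads the same memory, so a write through a shows up in both rows.
a[0, 1] = 99.0
print(b)
# tensor([[ 1., 99.,  1.],
#         [ 1., 99.,  1.]])
print(a.data_ptr() == b.data_ptr())   # True: same storage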

Exercise 5: Permuted Tensor

  • Change dimension order and inspect stride layout.

x3 = torch.randn(2, 3, 4)
y3 = x3.permute(2, 0, 1)
x3, x3.shape
(tensor([[[-4.9521e-01, -1.5715e+00,  9.7796e-01, -2.6375e-01],
          [ 1.0992e+00,  4.2912e-01,  7.5855e-02,  1.6052e+00],
          [-7.1012e-01,  7.3460e-01, -3.9331e-01,  1.0008e+00]],
 
         [[ 5.4850e-01, -1.6360e+00,  1.8978e-01, -1.3920e-01],
          [ 1.4362e-01,  4.4029e-01, -2.0576e-01, -2.7227e-01],
          [-1.2247e-03,  1.3967e+00, -5.3473e-01, -7.4465e-01]]]),
 torch.Size([2, 3, 4]))
x3.stride()
(12, 4, 1)
y3, y3.shape
(tensor([[[-4.9521e-01,  1.0992e+00, -7.1012e-01],
          [ 5.4850e-01,  1.4362e-01, -1.2247e-03]],
 
         [[-1.5715e+00,  4.2912e-01,  7.3460e-01],
          [-1.6360e+00,  4.4029e-01,  1.3967e+00]],
 
         [[ 9.7796e-01,  7.5855e-02, -3.9331e-01],
          [ 1.8978e-01, -2.0576e-01, -5.3473e-01]],
 
         [[-2.6375e-01,  1.6052e+00,  1.0008e+00],
          [-1.3920e-01, -2.7227e-01, -7.4465e-01]]]),
 torch.Size([4, 2, 3]))
  • To move along the new dimension 0 (size 4, originally dim 2), you step by 1 in memory (same as original dim 2).

  • To move along the new dimension 1 (size 2, originally dim 0), you step by 12 in memory (same as original dim 0).

  • To move along the new dimension 2 (size 3, originally dim 1), you step by 4 in memory (same as original dim 1).

  • This shows that permutation changes the order of strides but not their values. The new strides correspond to the original strides in the permuted order.
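  • A quick verification (reusing x3 and y3 from above): the permuted strides are just the original strides reordered, and the data blob is untouched.
print(y3.stride())                      # (1, 12, 4)
print(y3.data_ptr() == x3.data_ptr())   # True: same underlying memory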

View:

  • In PyTorch, viewing a tensor refers to creating a new tensor that shares the same underlying data storage as the original tensor but with a different shape, stride, or metadata. This means the viewed tensor does not copy the data; instead, it provides an alternative way to interpret the existing data in memory.

1- Memory:

  • Views allow tensors to share memory.

  • Modifying the viewed tensor also modifies the original tensor.

2- Shape and stride adjustment:

  • As we saw earlier, a view can reinterpret a tensor’s shape and stride without copying it or changing the memory.

3- Zero-cost operation:

  • Viewing is efficient because it does not allocate new memory or copy data.

  • Operations like view(), transpose(), permute(), expand(), and slicing often return views.
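  • A minimal sketch of these points (the tensor names t and v are just for illustration): view() shares storage, so an in-place write through the view is visible in the original tensor.
t = torch.arange(6)
v = t.view(2, 3)                      # same storage, new shape/stride
v[0, 0] = 100
print(t)                              # tensor([100,   1,   2,   3,   4,   5])
print(t.data_ptr() == v.data_ptr())   # True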

Broadcasting:

  • Broadcasting automatically expands smaller tensors to match the shape of larger tensors for element-wise operations by following specific rules:
    • Tensors are aligned from right to left
    • If sizes are equal, they are compatible
    • If one tensor’s size is 1, it’s stretched to match the other
    • If one tensor is missing a dimension, it’s treated as a size-1 dimension (then stretched to match)
# adding vector to scalar
vec = torch.tensor([1, 2, 3])
scal = torch.tensor(5)
out =  vec + scal
print(vec, vec.shape)
print(scal, scal.shape)
print(out, out.shape)
tensor([1, 2, 3]) torch.Size([3])
tensor(5) torch.Size([])
tensor([6, 7, 8]) torch.Size([3])
# tensor size 1
A = torch.tensor([[1, 2], [3, 4]])  # Shape (2, 2)
B = torch.tensor([[10, 20]])         # Shape (1, 2)
C = A + B
print(A, A.shape)
print(B, B.shape)
print(C, C.shape)
tensor([[1, 2],
        [3, 4]]) torch.Size([2, 2])
tensor([[10, 20]]) torch.Size([1, 2])
tensor([[11, 22],
        [13, 24]]) torch.Size([2, 2])
# tensor missing a dimension:
D = torch.tensor([[1, 2], [3, 4]])  # Shape (2, 2)
R = torch.tensor([10, 20])          # Shape (2,)
S = D + R
print(D, D.shape)
print(R, R.shape)
print(S, S.shape)
tensor([[1, 2],
        [3, 4]]) torch.Size([2, 2])
tensor([10, 20]) torch.Size([2])
tensor([[11, 22],
        [13, 24]]) torch.Size([2, 2])
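  • Under the hood this “stretching” is the same stride-0 trick we saw with expand (a small sketch using torch.broadcast_to and reusing R from above): the broadcast dimension gets stride 0, so no data is copied.
Rb = torch.broadcast_to(R, (2, 2))
print(Rb.shape, Rb.stride())           # torch.Size([2, 2]) (0, 1)
print(Rb.data_ptr() == R.data_ptr())   # True: still the same memory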